AAAI.2023 - Safe and Robust AI

Total: 76

#1 Formally Verified SAT-Based AI Planning [PDF] [Copy] [Kimi]

Authors: Mohammad Abdulaziz ; Friedrich Kurz

We present an executable formally verified SAT encoding of ground classical AI planning problems. We use the theorem prover Isabelle/HOL to perform the verification. We experimentally test the verified encoding and show that it can be used for reasonably sized standard planning benchmarks. We also use it as a reference to test a state-of-the-art SAT-based planner, showing that it sometimes falsely claims that problems have no solutions of certain lengths.

#2 Shielding in Resource-Constrained Goal POMDPs [PDF] [Copy] [Kimi]

Authors: Michal Ajdarów ; Šimon Brlej ; Petr Novotný

We consider partially observable Markov decision processes (POMDPs) modeling an agent that needs a supply of a certain resource (e.g., electricity stored in batteries) to operate correctly. The resource is consumed by the agent's actions and can be replenished only in certain states. The agent aims to minimize the expected cost of reaching some goal while preventing resource exhaustion, a problem we call resource-constrained goal optimization (RSGO). We take a two-step approach to the RSGO problem. First, using formal methods techniques, we design an algorithm computing a shield for a given scenario: a procedure that observes the agent and prevents it from using actions that might eventually lead to resource exhaustion. Second, we augment the POMCP heuristic search algorithm for POMDP planning with our shields to obtain an algorithm solving the RSGO problem. We implement our algorithm and present experiments showing its applicability to benchmarks from the literature.
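
A minimal sketch of the shielding idea in a fully observable toy setting (the paper targets POMDPs and derives shields with formal-methods techniques): the grid, unit move cost, and BFS-based energy-to-charger computation below are illustrative assumptions, not the authors' construction. The shield only admits actions after which a charging station provably remains reachable with the remaining battery.

```python
from collections import deque

# Toy grid: '.' free cell, 'C' charging station, '#' wall.
GRID = ["....C",
        ".##..",
        ".....",
        "#..#.",
        "....."]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
COST = 1  # worst-case energy consumed per move (assumption)

def dist_to_charger(grid):
    """BFS from every charger: minimal energy needed to recharge from each cell."""
    R, C = len(grid), len(grid[0])
    dist = [[float("inf")] * C for _ in range(R)]
    q = deque()
    for r in range(R):
        for c in range(C):
            if grid[r][c] == "C":
                dist[r][c] = 0
                q.append((r, c))
    while q:
        r, c = q.popleft()
        for dr, dc in MOVES.values():
            nr, nc = r + dr, c + dc
            if 0 <= nr < R and 0 <= nc < C and grid[nr][nc] != "#" and dist[nr][nc] > dist[r][c] + COST:
                dist[nr][nc] = dist[r][c] + COST
                q.append((nr, nc))
    return dist

DIST = dist_to_charger(GRID)

def shield(pos, battery):
    """Return only the actions after which a charger is still provably reachable."""
    r, c = pos
    safe = []
    for name, (dr, dc) in MOVES.items():
        nr, nc = r + dr, c + dc
        if not (0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])) or GRID[nr][nc] == "#":
            continue
        if battery - COST >= DIST[nr][nc]:  # enough energy left to recharge from the next cell
            safe.append(name)
    return safe

print(shield((4, 0), battery=10))  # only moves from which a charger remains reachable survive
```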

#3 Implicit Bilevel Optimization: Differentiating through Bilevel Optimization Programming [PDF] [Copy] [Kimi]

Author: Francesco Alesiani

Bilevel Optimization Programming is used to model complex and conflicting interactions between agents, for example in Robust AI or Privacy-preserving AI. Integrating bilevel mathematical programming within deep learning is thus an essential objective for the Machine Learning community. Previously proposed approaches only consider single-level programming. In this paper, we extend existing single-level optimization programming approaches and propose Differentiating through Bilevel Optimization Programming (BiGrad) for end-to-end learning of models that use Bilevel Programming as a layer. BiGrad has wide applicability and can be used in modern machine learning frameworks. BiGrad is applicable to both continuous and combinatorial Bilevel optimization problems. We describe a class of gradient estimators for the combinatorial case that reduces the computational complexity requirements; for the continuous case, the gradient computation takes advantage of the push-back approach (i.e., the vector-Jacobian product) for an efficient implementation. Experiments show that BiGrad successfully extends existing single-level approaches to Bilevel Programming.
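
As a hedged illustration of differentiating through a lower-level optimization with reverse-mode vector-Jacobian products, the sketch below unrolls an inner gradient-descent solver for a toy ridge-regression lower level and backpropagates a validation loss to the upper-level variable. The ridge-regression setup, step counts, and learning rates are assumptions, and BiGrad's actual estimators (in particular for the combinatorial case) differ.

```python
import torch

torch.manual_seed(0)
X_tr, y_tr = torch.randn(50, 5), torch.randn(50)    # toy training split
X_val, y_val = torch.randn(20, 5), torch.randn(20)  # toy validation split

log_lam = torch.zeros(1, requires_grad=True)        # upper-level variable (log regularization strength)

def inner_solve(log_lam, steps=100, lr=0.05):
    """Lower level: ridge regression solved by unrolled gradient descent."""
    w = torch.zeros(5, requires_grad=True)
    lam = log_lam.exp()
    for _ in range(steps):
        loss = ((X_tr @ w - y_tr) ** 2).mean() + lam * (w ** 2).sum()
        g, = torch.autograd.grad(loss, w, create_graph=True)  # keep graph for the hypergradient
        w = w - lr * g
    return w

for _ in range(30):
    w_star = inner_solve(log_lam)
    outer_loss = ((X_val @ w_star - y_val) ** 2).mean()        # upper-level objective
    hypergrad, = torch.autograd.grad(outer_loss, log_lam)      # reverse-mode VJPs through the unrolled solver
    with torch.no_grad():
        log_lam -= 0.1 * hypergrad

print("learned log-regularization:", log_lam.item())
```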

#4 Query-Based Hard-Image Retrieval for Object Detection at Test Time [PDF] [Copy] [Kimi]

Authors: Edward Ayers ; Jonathan Sadeghi ; John Redford ; Romain Mueller ; Puneet K. Dokania

There is a longstanding interest in capturing the error behaviour of object detectors by finding images where their performance is likely to be unsatisfactory. In real-world applications such as autonomous driving, it is also crucial to characterise potential failures beyond simple requirements of detection performance. For example, a missed detection of a pedestrian close to an ego vehicle will generally require closer inspection than a missed detection of a car in the distance. The problem of predicting such potential failures at test time has largely been overlooked in the literature and conventional approaches based on detection uncertainty fall short in that they are agnostic to such fine-grained characterisation of errors. In this work, we propose to reformulate the problem of finding "hard" images as a query-based hard image retrieval task, where queries are specific definitions of "hardness", and offer a simple and intuitive method that can solve this task for a large family of queries. Our method is entirely post-hoc, does not require ground-truth annotations, is independent of the choice of a detector, and relies on an efficient Monte Carlo estimation that uses a simple stochastic model in place of the ground-truth. We show experimentally that it can be applied successfully to a wide variety of queries for which it can reliably identify hard images for a given detector without any labelled data. We provide results on ranking and classification tasks using the widely used RetinaNet, Faster-RCNN, Mask-RCNN, and Cascade Mask-RCNN object detectors. The code for this project is available at https://github.com/fiveai/hardest.

#5 Probabilities Are Not Enough: Formal Controller Synthesis for Stochastic Dynamical Models with Epistemic Uncertainty [PDF] [Copy] [Kimi]

Authors: Thom Badings ; Licio Romao ; Alessandro Abate ; Nils Jansen

Capturing uncertainty in models of complex dynamical systems is crucial to designing safe controllers. Stochastic noise causes aleatoric uncertainty, whereas imprecise knowledge of model parameters leads to epistemic uncertainty. Several approaches use formal abstractions to synthesize policies that satisfy temporal specifications related to safety and reachability. However, the underlying models exclusively capture aleatoric but not epistemic uncertainty, and thus require that model parameters are known precisely. Our contribution to overcoming this restriction is a novel abstraction-based controller synthesis method for continuous-state models with stochastic noise and uncertain parameters. By sampling techniques and robust analysis, we capture both aleatoric and epistemic uncertainty, with a user-specified confidence level, in the transition probability intervals of a so-called interval Markov decision process (iMDP). We synthesize an optimal policy on this iMDP, which translates (with the specified confidence level) to a feedback controller for the continuous model with the same performance guarantees. Our experimental benchmarks confirm that accounting for epistemic uncertainty leads to controllers that are more robust against variations in parameter values.

#6 Accelerating Inverse Learning via Intelligent Localization with Exploratory Sampling [PDF] [Copy] [Kimi]

Authors: Sirui Bi ; Victor Fung ; Jiaxin Zhang

In the scope of "AI for Science", solving inverse problems is a longstanding challenge in materials and drug discovery, where the goal is to determine the hidden structures given a set of desirable properties. Deep generative models have recently been proposed to solve inverse problems, but they currently struggle with expensive forward operators, with precisely localizing the exact solutions, and with fully exploring the parameter space without missing solutions. In this work, we propose a novel approach (called iPage) to accelerate the inverse learning process by leveraging probabilistic inference from deep invertible models and deterministic optimization via fast gradient descent. Given a target property, the learned invertible model provides a posterior over the parameter space; we identify these posterior samples as an intelligent prior initialization which enables us to narrow down the search space. We then perform gradient descent to calibrate the inverse solutions within a local region. Meanwhile, a space-filling sampling is imposed on the latent space to better explore and capture all possible solutions. We evaluate our approach on three benchmark tasks and create two datasets of real-world applications from quantum chemistry and additive manufacturing, and find that our method achieves superior performance compared to several state-of-the-art baseline methods. The iPage code is available at https://github.com/jxzhangjhu/MatDesINNe.

#7 Attention-Conditioned Augmentations for Self-Supervised Anomaly Detection and Localization [PDF] [Copy] [Kimi]

Authors: Behzad Bozorgtabar ; Dwarikanath Mahapatra

Self-supervised anomaly detection and localization are critical in real-world scenarios in which collecting anomalous samples and pixel-wise labeling is tedious or infeasible, all the more so when a wide variety of unseen anomalies can surface at test time. Our approach involves a pretext task in the context of masked image modeling, where the goal is to impose agreement between cluster assignments obtained from the representation of an image view containing saliency-aware masked patches and the uncorrupted image view. We harness the self-attention map extracted from the transformer to mask non-salient image patches without destroying the crucial structure associated with the foreground object. Subsequently, the pre-trained model is fine-tuned to detect and localize simulated anomalies generated under the guidance of the transformer's self-attention map. We conducted extensive validation and ablations on a benchmark of industrial images and achieved superior performance against competing methods. We also show the adaptability of our method to medical images from a chest X-ray benchmark.

#8 Robust-by-Design Classification via Unitary-Gradient Neural Networks [PDF] [Copy] [Kimi]

Authors: Fabio Brau ; Giulio Rossolini ; Alessandro Biondi ; Giorgio Buttazzo

The use of neural networks in safety-critical systems requires safe and robust models, due to the existence of adversarial attacks. Knowing the minimal adversarial perturbation of any input x, or, equivalently, knowing the distance of x from the classification boundary, allows evaluating the classification robustness, providing certifiable predictions. Unfortunately, state-of-the-art techniques for computing such a distance are computationally expensive and hence not suited for online applications. This work proposes a novel family of classifiers, namely Signed Distance Classifiers (SDCs), that, from a theoretical perspective, directly output the exact distance of x from the classification boundary, rather than a probability score (e.g., SoftMax). SDCs represent a family of robust-by-design classifiers. To practically address the theoretical requirements of an SDC, a novel network architecture named Unitary-Gradient Neural Network is presented. Experimental results show that the proposed architecture approximates a signed distance classifier, hence allowing an online certifiable classification of x at the cost of a single inference.
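
A small sketch of the quantity an SDC is meant to output: a first-order estimate of the distance of x from the boundary between the predicted class and the runner-up, i.e. the logit margin divided by the norm of its input gradient. The stand-in classifier below is an assumption; in a Unitary-Gradient Neural Network the gradient norm is 1 by construction, so the margin itself would already be the signed distance and no extra gradient computation would be needed at inference time.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))  # stand-in classifier

def local_signed_distance(model, x):
    """First-order estimate of the distance of x to the decision boundary."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    top2 = logits.topk(2).indices
    margin = logits[top2[0]] - logits[top2[1]]   # f_c(x) - f_j(x), runner-up class j
    grad, = torch.autograd.grad(margin, x)
    return (margin / grad.norm()).item()         # for a unitary-gradient net, grad.norm() == 1

x = torch.randn(10)
print(local_signed_distance(model, x))
```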

#9 Ensemble-in-One: Ensemble Learning within Random Gated Networks for Enhanced Adversarial Robustness [PDF] [Copy] [Kimi]

Authors: Yi Cai ; Xuefei Ning ; Huazhong Yang ; Yu Wang

Adversarial attacks have threatened modern deep learning systems by crafting adversarial examples with small perturbations to fool convolutional neural networks (CNNs). To alleviate this, ensemble training methods have been proposed to facilitate better adversarial robustness by diversifying the vulnerabilities among the sub-models, while maintaining natural accuracy comparable to standard training. Previous practice also demonstrates that enlarging the ensemble can improve robustness. However, conventional ensemble methods scale poorly, owing to the rapidly increasing complexity as more sub-models are included in the ensemble. Moreover, it is usually infeasible to train or deploy an ensemble with a substantial number of sub-models, owing to tight hardware resource budgets and latency requirements. In this work, we propose Ensemble-in-One (EIO), a simple but effective method to efficiently enlarge the ensemble with a random gated network (RGN). EIO augments a candidate model by replacing the parametrized layers with multi-path random gated blocks (RGBs) to construct an RGN. Scalability is significantly boosted because the number of paths increases exponentially with the RGN depth. Then, by learning from the vulnerabilities of numerous other paths within the RGN, every path obtains better adversarial robustness. Our experiments demonstrate that EIO consistently outperforms previous ensemble training methods with smaller computational overhead, while achieving better accuracy-robustness trade-offs than adversarial training methods under black-box transfer attacks. Code is available at https://github.com/cai-y13/Ensemble-in-One.git
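
A minimal sketch of a multi-path random gated block, assuming plain 3x3 convolutions as the parallel paths; the paper replaces the parametrized layers of an existing candidate model and trains the paths against one another's adversarial examples, which the sketch omits.

```python
import random
import torch
import torch.nn as nn

class RandomGatedBlock(nn.Module):
    """Multi-path block: each forward pass routes through one randomly gated path.

    Stacking these blocks yields a number of end-to-end paths that grows
    exponentially with depth, which is the scalability argument behind EIO.
    """
    def __init__(self, channels, num_paths=3):
        super().__init__()
        self.paths = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1) for _ in range(num_paths)
        )

    def forward(self, x):
        return self.paths[random.randrange(len(self.paths))](x)

net = nn.Sequential(RandomGatedBlock(16), nn.ReLU(), RandomGatedBlock(16), nn.ReLU())
x = torch.randn(2, 16, 8, 8)
print(net(x).shape)   # two stacked blocks already expose 3 * 3 = 9 distinct paths
```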

#10 Safe Reinforcement Learning via Shielding under Partial Observability [PDF] [Copy] [Kimi]

Authors: Steven Carr ; Nils Jansen ; Sebastian Junges ; Ufuk Topcu

Safe exploration is a common problem in reinforcement learning (RL) that aims to prevent agents from making disastrous decisions while exploring their environment. A family of approaches to this problem assumes domain knowledge in the form of a (partial) model of the environment to decide upon the safety of an action. A so-called shield forces the RL agent to select only safe actions. However, for adoption in various applications, one must look beyond enforcing safety and also ensure that RL remains applicable with good performance. We extend the applicability of shields via tight integration with state-of-the-art deep RL, and provide an extensive, empirical study in challenging, sparse-reward environments under partial observability. We show that a carefully integrated shield ensures safety and can improve the convergence rate and final performance of RL agents. We furthermore show that a shield can be used to bootstrap state-of-the-art RL agents: they remain safe after initial learning in a shielded setting, eventually allowing us to disable a potentially over-conservative shield.

#11 PowRL: A Reinforcement Learning Framework for Robust Management of Power Networks [PDF] [Copy] [Kimi]

Authors: Anandsingh Chauhan ; Mayank Baranwal ; Ansuma Basumatary

Power grids across the world play an important societal and economic role by providing uninterrupted, reliable and transient-free power to several industries, businesses and household consumers. With the advent of renewable power resources and EVs resulting in uncertain generation and highly dynamic load demands, it has become increasingly important to ensure robust operation of power networks through suitable management of transient stability issues and to localize blackout events. In light of the ever-increasing stress on modern grid infrastructure and grid operators, this paper presents a reinforcement learning (RL) framework, PowRL, to mitigate the effects of unexpected network events and to reliably maintain electricity everywhere on the network at all times. PowRL leverages a novel heuristic for overload management, along with RL-guided decision making on optimal topology selection, to ensure that the grid is operated safely and reliably (with no overloads). PowRL is benchmarked on a variety of competition datasets hosted by L2RPN (Learning to Run a Power Network). Even with its reduced action space, PowRL tops the leaderboard in the L2RPN NeurIPS 2020 challenge (Robustness track) at an aggregate level, while also being the top-performing agent in the L2RPN WCCI 2020 challenge. Moreover, detailed analysis shows state-of-the-art performance by the PowRL agent in some of the test scenarios.

#12 Two Wrongs Don’t Make a Right: Combating Confirmation Bias in Learning with Label Noise [PDF] [Copy] [Kimi]

Authors: Mingcai Chen ; Hao Cheng ; Yuntao Du ; Ming Xu ; Wenyu Jiang ; Chongjun Wang

Noisy labels damage the performance of deep networks. For robust learning, a prominent two-stage pipeline alternates between eliminating possibly incorrect labels and semi-supervised training. However, discarding part of the noisy labels can result in a loss of information, especially when the corruption depends on the data, e.g., class-dependent or instance-dependent noise. Moreover, from the training dynamics of a representative two-stage method, DivideMix, we identify the domination of confirmation bias: pseudo-labels fail to correct a considerable number of noisy labels, and consequently the errors accumulate. To sufficiently exploit information from noisy labels and mitigate wrong corrections, we propose Robust Label Refurbishment (Robust LR)—a new hybrid method that integrates pseudo-labeling and confidence estimation techniques to refurbish noisy labels. We show that our method successfully alleviates the damage of both label noise and confirmation bias. As a result, it achieves state-of-the-art performance across datasets and noise types, namely CIFAR under different levels of synthetic noise, and mini-WebVision and ANIMAL-10N with real-world noise.
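
A hedged sketch of the label-refurbishment idea: keep the given (possibly noisy) label where the model is uncertain and substitute the model's pseudo-label where it is confident, producing soft targets instead of discarding samples. The hard confidence gate and threshold below are illustrative assumptions; Robust LR's actual confidence estimation is more elaborate.

```python
import torch
import torch.nn.functional as F

def refurbish_labels(logits, noisy_labels, num_classes, threshold=0.9):
    """Confidence-gated label refurbishment (illustrative; the threshold is made up)."""
    probs = F.softmax(logits, dim=1)
    conf, pseudo = probs.max(dim=1)
    given_onehot = F.one_hot(noisy_labels, num_classes).float()
    pseudo_onehot = F.one_hot(pseudo, num_classes).float()
    w = (conf > threshold).float().unsqueeze(1)          # 1 -> trust the model, 0 -> keep the given label
    return w * pseudo_onehot + (1 - w) * given_onehot    # soft targets for a cross-entropy-style loss

logits = torch.randn(8, 10)
noisy = torch.randint(0, 10, (8,))
targets = refurbish_labels(logits, noisy, num_classes=10)
loss = -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```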

#13 Testing the Channels of Convolutional Neural Networks [PDF] [Copy] [Kimi]

Authors: Kang Choi ; Donghyun Son ; Younghoon Kim ; Jiwon Seo

Neural networks have complex structures, and thus it is hard to understand their inner workings and ensure correctness. To understand and debug convolutional neural networks (CNNs), we propose techniques for testing the channels of CNNs. We design FtGAN, an extension to GAN, that can generate test data while varying the intensity (i.e., the sum of the neurons) of a channel of a target CNN. We also propose a channel selection algorithm to find representative channels for testing. To efficiently inspect the target CNN's inference computations, we define an unexpectedness score, which estimates how similar the inference computation of the test data is to that of the training data. We evaluated FtGAN with five public datasets and showed that our techniques successfully identify defective channels in five different CNN models.

#14 Feature-Space Bayesian Adversarial Learning Improved Malware Detector Robustness [PDF] [Copy] [Kimi]

Authors: Bao Gia Doan ; Shuiqiao Yang ; Paul Montague ; Olivier De Vel ; Tamas Abraham ; Seyit Camtepe ; Salil S. Kanhere ; Ehsan Abbasnejad ; Damith C. Ranashinghe

We present a new algorithm to train a robust malware detector. Malware is a prolific problem and malware detectors are a front-line defense. Modern detectors rely on machine learning algorithms, so the adversarial objective is to devise alterations to the malware code that decrease the chance of detection whilst preserving the functionality and realism of the malware. Adversarial learning is effective in improving robustness, but generating functional and realistic adversarial malware samples is non-trivial, because: i) in contrast to tasks capable of using gradient-based feedback, adversarial learning is hard in a domain without a differentiable mapping function from the problem space (malware code inputs) to the feature space; and ii) it is difficult to ensure the adversarial malware is realistic and functional. This presents a challenge for developing scalable adversarial machine learning algorithms for large datasets at a production or commercial scale to realize robust malware detectors. We propose an alternative: perform adversarial learning in the feature space rather than the problem space. We prove that the projection of perturbed, yet valid, malware from the problem space into the feature space is always a subset of the adversarials generated in the feature space. Hence, by training a network that is robust against feature-space adversarial examples, we inherently achieve robustness against problem-space adversarial examples. We formulate a Bayesian adversarial learning objective that captures the distribution of models for improved robustness. To explain the robustness of the Bayesian adversarial learning algorithm, we prove that our learning method bounds the difference between the adversarial risk and the empirical risk, and thereby improves robustness. We show that Bayesian neural networks (BNNs) achieve state-of-the-art results, especially in the False Positive Rate (FPR) regime, and that adversarially trained BNNs achieve state-of-the-art robustness. Notably, adversarially trained BNNs are robust against stronger attacks with larger attack budgets by a margin of up to 15% on a recent production-scale malware dataset of more than 20 million samples. Importantly, our efforts create a benchmark for future defenses in the malware domain.
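
A simplified sketch of adversarial learning in the feature space: L-inf PGD is applied to the output of a stand-in feature extractor rather than to the malware code itself, and the classifier head is trained on the perturbed features. The architecture, perturbation budgets, and the omission of the Bayesian treatment of the weights are all assumptions for illustration; only the head sees the adversarial gradient in this simplification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, num_classes = 128, 2
extractor = nn.Sequential(nn.Linear(300, feat_dim), nn.ReLU())   # stand-in malware feature mapping
head = nn.Linear(feat_dim, num_classes)
opt = torch.optim.Adam(list(extractor.parameters()) + list(head.parameters()), lr=1e-3)

def feature_space_pgd(feats, labels, eps=0.5, steps=5, alpha=0.2):
    """L-inf PGD applied to the feature representation rather than the raw input."""
    delta = torch.zeros_like(feats, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(head(feats + delta), labels)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (feats + delta).detach()

x = torch.randn(32, 300)                 # stand-in static malware features
y = torch.randint(0, num_classes, (32,))
feats = extractor(x).detach()            # perturb in feature space, not problem space
adv_feats = feature_space_pgd(feats, y)
loss = F.cross_entropy(head(adv_feats), y)
opt.zero_grad(); loss.backward(); opt.step()
```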

#15 Correct-by-Construction Reinforcement Learning of Cardiac Pacemakers from Duration Calculus Requirements [PDF] [Copy] [Kimi]

Authors: Kalyani Dole ; Ashutosh Gupta ; John Komp ; Shankaranarayanan Krishna ; Ashutosh Trivedi

As the complexity of pacemaker devices continues to grow, the importance of formally capturing their functional correctness requirements cannot be overestimated. The pacemaker system specification document by Boston Scientific provides a widely accepted set of specifications for pacemakers. As these specifications are written in natural language, they are not amenable to automated verification, synthesis, or reinforcement learning of pacemaker systems. This paper presents a formalization of these requirements for a dual-chamber pacemaker in duration calculus (DC), a highly expressive real-time specification language. The proposed formalization allows us to automatically translate pacemaker requirements into executable specifications as stopwatch automata, which can be used to enable simulation, monitoring, validation, verification and automatic synthesis of pacemaker systems. The cyclic nature of the pacemaker-heart closed-loop system results in DC requirements that compile to a decidable subclass of stopwatch automata. We present shield reinforcement learning (shield RL), a shield-synthesis-based reinforcement learning algorithm that automatically constructs safety envelopes from DC specifications.

#16 SafeLight: A Reinforcement Learning Method toward Collision-Free Traffic Signal Control [PDF] [Copy] [Kimi]

Authors: Wenlu Du ; Junyi Ye ; Jingyi Gu ; Jing Li ; Hua Wei ; Guiling Wang

Traffic signal control is safety-critical for our daily life. Roughly one-quarter of road accidents in the U.S. happen at intersections due to problematic signal timing, urging the development of safety-oriented intersection control. However, existing studies on adaptive traffic signal control using reinforcement learning technologies have focused mainly on minimizing traffic delay while neglecting the potential exposure to unsafe conditions. We, for the first time, incorporate road safety standards as enforcement to ensure the safety of existing reinforcement learning methods, aiming toward operating intersections with zero collisions. We propose a safety-enhanced residual reinforcement learning method (SafeLight) and employ multiple optimization techniques, such as a multi-objective loss function and reward shaping, for better knowledge integration. Extensive experiments are conducted using both synthetic and real-world benchmark datasets. Results show that our method can significantly reduce collisions while increasing traffic mobility.

#17 PatchNAS: Repairing DNNs in Deployment with Patched Network Architecture Search [PDF] [Copy] [Kimi]

Authors: Yuchu Fang ; Wenzhong Li ; Yao Zeng ; Yang Zheng ; Zheng Hu ; Sanglu Lu

Despite being widely deployed in safety-critical applications such as autonomous driving and health care, deep neural networks (DNNs) still suffer from non-negligible reliability issues. Numerous works have reported that DNNs are vulnerable to both natural environmental noise and man-made adversarial noise. How to repair DNNs in deployment with noisy samples is a crucial topic for the robustness of neural networks. While many network repairing methods based on data augmentation and weight adjustment have been proposed, they require retraining and redeploying the whole model, which causes high overhead and is infeasible for varying faulty cases in different deployment environments. In this paper, we propose a novel network repairing framework called PatchNAS from the architecture perspective, where we freeze the pretrained DNNs and introduce a small patch network to deal with failure samples at runtime. PatchNAS introduces a novel network instrumentation method to determine the faulty stage of the network structure given the collected failure samples. A small patch network structure is searched for in an unsupervised manner using neural architecture search (NAS) with data samples from the deployment environment. The patch network repairs the DNNs by correcting the output feature maps of the faulty stage, which helps to maintain network performance on normal samples and enhance robustness in noisy environments. Extensive experiments based on several DNNs across 15 types of natural noise show that the proposed PatchNAS outperforms the state of the art with significant performance improvements as well as much lower deployment overhead.

#18 Similarity Distribution Based Membership Inference Attack on Person Re-identification [PDF] [Copy] [Kimi]

Authors: Junyao Gao ; Xinyang Jiang ; Huishuai Zhang ; Yifan Yang ; Shuguang Dou ; Dongsheng Li ; Duoqian Miao ; Cheng Deng ; Cairong Zhao

While person re-identification (Re-ID) has progressed rapidly due to its wide real-world applications, it also raises severe risks of leaking personal information from training data. This paper therefore focuses on quantifying this risk with membership inference (MI) attacks. Most existing MI attack algorithms focus on classification models, while Re-ID follows a totally different training and inference paradigm. Re-ID is a fine-grained recognition task with complex feature embedding, and the model outputs commonly used by existing MI attacks, such as logits and losses, are not accessible during inference. Since Re-ID focuses on modelling the relative relationship between image pairs instead of individual semantics, we conduct a formal and empirical analysis which validates that the distribution shift of the inter-sample similarity between training and test sets is a critical criterion for Re-ID membership inference. As a result, we propose a novel membership inference attack method based on the inter-sample similarity distribution. Specifically, a set of anchor images is sampled to represent the similarity distribution conditioned on a target image, and a neural network with a novel anchor selection module is proposed to predict the membership of the target image. Our experiments validate the effectiveness of the proposed approach on both the Re-ID task and the conventional classification task.

#19 Out-of-Distribution Detection Is Not All You Need [PDF] [Copy] [Kimi]

Authors: Joris Guerin ; Kevin Delmas ; Raul Ferreira ; Jérémie Guiochet

The usage of deep neural networks in safety-critical systems is limited by our ability to guarantee their correct behavior. Runtime monitors are components aiming to identify unsafe predictions and discard them before they can lead to catastrophic consequences. Several recent works on runtime monitoring have focused on out-of-distribution (OOD) detection, i.e., identifying inputs that are different from the training data. In this work, we argue that OOD detection is not a well-suited framework to design efficient runtime monitors and that it is more relevant to evaluate monitors based on their ability to discard incorrect predictions. We call this setting out-of-model-scope detection and discuss the conceptual differences with OOD. We also conduct extensive experiments on popular datasets from the literature to show that studying monitors in the OOD setting can be misleading: (1) very good OOD results can give a false impression of safety, and (2) comparison under the OOD setting does not allow identifying the best monitor for detecting errors. Finally, we also show that removing erroneous training data samples helps to train better monitors.
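
A small sketch contrasting the two evaluation settings for the same monitor decisions: precision/recall of the reject flag measured against "the input is OOD" versus against "the prediction is incorrect" (the out-of-model-scope view). The random arrays below are placeholders for a real monitor, model, and dataset.

```python
import numpy as np

def monitor_report(reject, pred_correct, is_ood):
    """Compare OOD-style and error-style evaluation of the same monitor.

    reject       : True where the monitor discards the prediction
    pred_correct : True where the model's prediction matches the ground truth
    is_ood       : True where the input comes from outside the training distribution
    """
    def precision_recall(flag, target):
        tp = np.sum(flag & target)
        return tp / max(flag.sum(), 1), tp / max(target.sum(), 1)

    p_ood, r_ood = precision_recall(reject, is_ood)            # conventional OOD-detection view
    p_oms, r_oms = precision_recall(reject, ~pred_correct)     # out-of-model-scope view: catch errors
    return {"ood": (p_ood, r_ood), "out_of_model_scope": (p_oms, r_oms)}

rng = np.random.default_rng(0)
reject = rng.random(1000) < 0.2        # placeholder monitor decisions
is_ood = rng.random(1000) < 0.1        # placeholder OOD labels
pred_correct = rng.random(1000) < 0.9  # placeholder model correctness
print(monitor_report(reject, pred_correct, is_ood))
```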

#20 Contrastive Self-Supervised Learning Leads to Higher Adversarial Susceptibility [PDF] [Copy] [Kimi]

Authors: Rohit Gupta ; Naveed Akhtar ; Ajmal Mian ; Mubarak Shah

Contrastive self-supervised learning (CSL) has managed to match or surpass the performance of supervised learning in image and video classification. However, it is still largely unknown if the nature of the representations induced by the two learning paradigms is similar. We investigate this under the lens of adversarial robustness. Our analysis of the problem reveals that CSL has intrinsically higher sensitivity to perturbations over supervised learning. We identify the uniform distribution of data representation over a unit hypersphere in the CSL representation space as the key contributor to this phenomenon. We establish that this is a result of the presence of false negative pairs in the training process, which increases model sensitivity to input perturbations. Our finding is supported by extensive experiments for image and video classification using adversarial perturbations and other input corruptions. We devise a strategy to detect and remove false negative pairs that is simple, yet effective in improving model robustness with CSL training. We close up to 68% of the robustness gap between CSL and its supervised counterpart. Finally, we contribute to adversarial learning by incorporating our method in CSL. We demonstrate an average gain of about 5% over two different state-of-the-art methods in this domain.
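
A hedged sketch of removing suspected false negatives from a contrastive objective: negatives whose similarity to the anchor exceeds a cutoff are masked out of the InfoNCE denominator. The cosine-similarity threshold is a made-up stand-in for the paper's actual detection strategy.

```python
import torch
import torch.nn.functional as F

def infonce_drop_false_negatives(z1, z2, tau=0.2, fn_threshold=0.8):
    """InfoNCE between two augmented views, excluding suspected false negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / tau                               # (N, N) similarities of view 1 vs view 2
    n = sim.size(0)
    eye = torch.eye(n, dtype=torch.bool)
    false_neg = (sim * tau > fn_threshold) & ~eye         # off-diagonal but suspiciously similar
    sim = sim.masked_fill(false_neg, float("-inf"))       # removed from the softmax denominator
    return F.cross_entropy(sim, torch.arange(n))          # positives sit on the diagonal

z1, z2 = torch.randn(64, 128), torch.randn(64, 128)       # placeholder embeddings of two views
loss = infonce_drop_false_negatives(z1, z2)
```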

#21 AutoCost: Evolving Intrinsic Cost for Zero-Violation Reinforcement Learning [PDF] [Copy] [Kimi]

Authors: Tairan He ; Weiye Zhao ; Changliu Liu

Safety is a critical hurdle that limits the application of deep reinforcement learning to real-world control tasks. To this end, constrained reinforcement learning leverages cost functions to improve safety in constrained Markov decision processes. However, constrained methods fail to achieve zero violation even when the cost limit is zero. This paper analyzes the reasons for such failure, which suggests that a proper cost function plays an important role in constrained RL. Inspired by this analysis, we propose AutoCost, a simple yet effective framework that automatically searches for cost functions that help constrained RL achieve zero-violation performance. We validate the proposed method and the searched cost function on the safety benchmark Safety Gym. We compare augmented agents that use our cost function to provide additive intrinsic costs to a Lagrangian-based policy learner and a constrained-optimization policy learner against baseline agents that use the same policy learners but with only extrinsic costs. Results show that the converged policies with intrinsic costs achieve zero constraint violation in all environments and comparable performance with the baselines.
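
A minimal sketch of the cost augmentation: the cost signal fed to a Lagrangian constrained-RL learner is the extrinsic cost plus a learned intrinsic-cost network, and the multiplier is updated by dual ascent toward a zero cost limit. The network shapes and the placeholder batch are assumptions; AutoCost's evolutionary search over the intrinsic-cost parameters and the full policy-optimization loop are omitted.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 2))                  # stand-in policy
intrinsic_cost = nn.Sequential(nn.Linear(8 + 2, 32), nn.Tanh(), nn.Linear(32, 1), nn.Softplus())
lam = torch.tensor(1.0)   # Lagrange multiplier
cost_limit = 0.0          # zero-violation target

def augmented_cost(state, action, extrinsic_cost):
    """Cost signal fed to the constrained learner: extrinsic + searched intrinsic cost."""
    return extrinsic_cost + intrinsic_cost(torch.cat([state, action], dim=-1)).squeeze(-1)

state = torch.randn(4, 8)                              # placeholder batch of states
action = policy(state)
c_ext = torch.zeros(4)                                 # e.g., no violations observed in this batch
J_c = augmented_cost(state, action, c_ext).mean()      # estimate of the expected augmented cost

# Dual ascent on the multiplier: the policy objective is penalized by lam * (J_c - cost_limit).
lam = (lam + 0.01 * (J_c.detach() - cost_limit)).clamp(min=0.0)
print(float(lam))
```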

#22 Test Time Augmentation Meets Post-hoc Calibration: Uncertainty Quantification under Real-World Conditions [PDF] [Copy] [Kimi]

Authors: Achim Hekler ; Titus J. Brinker ; Florian Buettner

Communicating the predictive uncertainty of deep neural networks transparently and reliably is important in many safety-critical applications such as medicine. However, modern neural networks tend to be poorly calibrated, resulting in wrong predictions made with high confidence. While existing post-hoc calibration methods like temperature scaling or isotonic regression yield strongly calibrated predictions in artificial experimental settings, their efficiency can be significantly reduced in real-world applications, where scarcity of labeled data or domain drift is commonly present. In this paper, we first investigate the impact of these characteristics on post-hoc calibration and introduce an easy-to-implement extension of common post-hoc calibration methods based on test-time augmentation. In extensive experiments, we demonstrate that our approach results in substantially better calibration on various architectures. We demonstrate the robustness of our proposed approach on a real-world application for skin cancer classification and show that it facilitates safe decision-making under real-world uncertainties.
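
A short sketch of combining test-time augmentation with temperature scaling: logits are averaged over several augmented views, a single temperature is fitted on a labeled calibration set by minimizing the NLL of the TTA-averaged logits, and calibrated probabilities are obtained by dividing by that temperature. The linear stand-in model and Gaussian-noise augmentation are assumptions, not the paper's setup.

```python
import torch
import torch.nn.functional as F

def tta_logits(model, x, augment, n_views=8):
    """Average the model's logits over several test-time augmentations of x."""
    views = torch.stack([augment(x) for _ in range(n_views)])        # (V, B, ...)
    with torch.no_grad():
        return torch.stack([model(v) for v in views]).mean(dim=0)    # (B, C)

def fit_temperature(cal_logits, cal_labels, steps=200, lr=0.01):
    """Standard temperature scaling fitted on TTA-averaged calibration logits."""
    log_t = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(cal_logits / log_t.exp(), cal_labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

# Toy stand-ins: a linear "model" and a Gaussian-noise "augmentation".
model = torch.nn.Linear(16, 5)
augment = lambda x: x + 0.05 * torch.randn_like(x)
x_cal, y_cal = torch.randn(100, 16), torch.randint(0, 5, (100,))
T = fit_temperature(tta_logits(model, x_cal, augment), y_cal)
probs = F.softmax(tta_logits(model, x_cal, augment) / T, dim=1)      # calibrated predictions
```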

#23 Robust Training of Neural Networks against Bias Field Perturbations [PDF] [Copy] [Kimi]

Authors: Patrick Henriksen ; Alessio Lomuscio

We introduce the problem of training neural networks such that they are robust against a class of smooth intensity perturbations modelled by bias fields. We first develop an approach towards this goal based on a state-of-the-art robust training method utilising Interval Bound Propagation (IBP). We analyse the resulting algorithm and observe that IBP often produces very loose bounds for bias field perturbations, which may be detrimental to training. We then propose an alternative approach based on Symbolic Interval Propagation (SIP), which usually results in significantly tighter bounds than IBP. We present ROBNET, a tool implementing these approaches for bias-field robust training. In experiments, networks trained with the SIP-based approach achieved up to 31% higher certified robustness while also maintaining better accuracy than networks trained with the IBP-based approach.
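
A minimal sketch of Interval Bound Propagation through a linear layer and a ReLU, the mechanism whose looseness under bias-field perturbations motivates the SIP-based alternative. The epsilon-box input set below is a simplification; a bias-field perturbation would instead be a smooth multiplicative intensity field over the image.

```python
import torch
import torch.nn as nn

def ibp_linear(layer, lower, upper):
    """Propagate elementwise input bounds [lower, upper] through a linear layer."""
    w, b = layer.weight, layer.bias
    center, radius = (upper + lower) / 2, (upper - lower) / 2
    new_center = center @ w.t() + b
    new_radius = radius @ w.abs().t()            # worst case over the input box
    return new_center - new_radius, new_center + new_radius

def ibp_relu(lower, upper):
    return lower.clamp(min=0), upper.clamp(min=0)

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
x = torch.randn(1, 4)
lo, hi = x - 0.1, x + 0.1                        # simple epsilon-box around the input
lo, hi = ibp_linear(net[0], lo, hi)
lo, hi = ibp_relu(lo, hi)
lo, hi = ibp_linear(net[2], lo, hi)
print(lo, hi)                                    # certified output bounds (often loose)
```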

#24 Redactor: A Data-Centric and Individualized Defense against Inference Attacks [PDF] [Copy] [Kimi]

Authors: Geon Heo ; Steven Euijong Whang

Information leakage is becoming a critical problem as various kinds of information become publicly available by mistake, and machine learning models are trained on that data to provide services. As a result, one's private information could easily be memorized by such trained models. Unfortunately, deleting the information is out of the question, as the data is already exposed to the Web or third-party platforms. Moreover, we cannot necessarily control the labeling process or the model training performed by other parties either. In this setting, we study the problem of targeted disinformation generation, where the goal is to dilute the data and thus make a model safer and more robust against inference attacks on a specific target (e.g., a person's profile) by only inserting new data. Our method finds the closest points to the target in the input space that will be labeled as a different class. Since we cannot control the labeling process, we instead conservatively estimate the labels probabilistically by combining the decision boundaries of multiple classifiers using data programming techniques. Our experiments show that a probabilistic decision boundary can be a good proxy for labelers, and that our approach is effective in defending against inference attacks and can scale to large data.

#25 Improving Adversarial Robustness with Self-Paced Hard-Class Pair Reweighting [PDF] [Copy] [Kimi]

Authors: Pengyue Hou ; Jie Han ; Xingyu Li

Deep Neural Networks are vulnerable to adversarial attacks. Among many defense strategies, adversarial training with untargeted attacks is one of the most effective methods. Theoretically, the adversarial perturbation in an untargeted attack can be added along an arbitrary direction, and the predicted labels of untargeted attacks should be unpredictable. However, we find that the naturally imbalanced inter-class semantic similarity makes hard-class pairs become virtual targets of each other. This study investigates the impact of such closely coupled classes on adversarial attacks and accordingly develops a self-paced reweighting strategy for adversarial training. Specifically, we propose to upweight hard-class pair losses in model optimization, which prompts learning discriminative features from hard classes. We further incorporate a term to quantify hard-class pair consistency in adversarial training, which greatly boosts model robustness. Extensive experiments show that the proposed adversarial training method achieves superior robustness performance over state-of-the-art defenses against a wide range of adversarial attacks. The code of the proposed SPAT is published at https://github.com/puerrrr/Self-Paced-Adversarial-Training.
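
A hedged sketch of the reweighting idea: each adversarial example's cross-entropy loss is scaled by how much probability mass the model places on its hardest competing class, so closely coupled class pairs receive larger weights as confusion grows. The weighting formula and the constant alpha are illustrative assumptions, not SPAT's exact scheme.

```python
import torch
import torch.nn.functional as F

def hard_pair_weighted_loss(logits, labels, alpha=2.0):
    """Cross-entropy reweighted by the probability of the hardest competing class."""
    probs = F.softmax(logits, dim=1)
    wrong = probs.clone()
    wrong.scatter_(1, labels.unsqueeze(1), 0.0)          # zero out the true class
    hardest = wrong.max(dim=1).values                    # probability mass on the hardest wrong class
    weights = 1.0 + alpha * hardest                      # self-paced: grows as confusion grows
    return (weights * F.cross_entropy(logits, labels, reduction="none")).mean()

adv_logits = torch.randn(16, 10)                         # placeholder logits on adversarial examples
labels = torch.randint(0, 10, (16,))
loss = hard_pair_weighted_loss(adv_logits, labels)
```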